Skip to content

3634: Improve code quality and increase PHPStan level#30

Open
martin-augment wants to merge 2 commits intomainfrom
pr-3634-2026-01-26-20-12-37
Open

3634: Improve code quality and increase PHPStan level#30
martin-augment wants to merge 2 commits intomainfrom
pr-3634-2026-01-26-20-12-37

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

3634: To review by AI

@github-actions github-actions bot added the Php label Jan 26, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 26, 2026

Walkthrough

This pull request refactors the Apache Avro PHP library to improve code consistency and type safety. Changes include renaming properties and parameters from snake_case to camelCase throughout multiple classes, adding explicit type hints and return type declarations to method signatures, and updating PHPDoc annotations from @returns to @return. New public methods are introduced (such as getAliases() and fieldsByAlias()), and helper methods are added for data compression and decompression. PHPStan analysis level is increased from 3 to 6 with a baseline configuration to suppress specific type mismatch warnings. Internal variable naming is standardized to camelCase, and type annotations are expanded across utility classes and schema-related code.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Jan 26, 2026

Code Review: PR #30 - Improve code quality and increase PHPStan level

Summary

This PR makes significant improvements to the PHP codebase by modernizing type hints, improving naming conventions, and increasing PHPStan level to 6. Overall, this is a positive change that enhances code quality and maintainability.


✅ Strengths

1. Excellent Type System Modernization

  • Added comprehensive type hints for parameters and return types throughout
  • Proper use of nullable types (?AvroSchema)
  • Modern union types (int|string, string|AvroDuration)
  • Promoted constructor properties reduce boilerplate

2. Better Code Quality

  • Consistent camelCase naming (e.g., $writers_schema$writersSchema)
  • match expressions replace switch statements for better type safety and exhaustiveness checking
  • Fixed PHPDoc tags (@returns@return)
  • Improved GMP resource documentation (resource\GMP)

3. Cleaner Architecture

  • Extracted compression/decompression logic into private methods (gzCompress(), zstdCompress(), etc.)
  • Better separation of concerns in DataFile classes

4. PHPStan Level 6

  • Significant improvement in static analysis coverage
  • Catches more potential bugs at development time

⚠️ Concerns & Recommendations

1. Test Coverage

The PR modifies 891 lines across 34 files but only touches 3 test files with minimal changes.

Recommendations:

  • Add unit tests for newly extracted compression/decompression methods
  • Consider testing type coercion edge cases in AvroDebug.php (lines 194, 213, 221)
  • Verify all compression codecs still work correctly (especially the refactored ones)

2. PHPStan Baseline (148 lines)

The baseline suppresses 37 type errors, mostly schema type mismatches like:

Parameter #1 $writersSchema expects AvroArraySchema, AvroSchema given

This indicates an architectural issue where the type hierarchy doesn't match usage patterns.

Recommendations:

  • Create follow-up issues to address baselined errors
  • Consider introducing type guards or narrowing methods to properly narrow AvroSchema to specific types
  • Document why each baseline entry is acceptable

3. Potential Breaking Changes

These changes could break downstream code:

  • Return type changes (e.g., append() now returns void explicitly)
  • Constructor signature changes with strict types
  • Method signature changes may break class extensions

Recommendations:

  • Document breaking changes in release notes
  • Consider semantic versioning implications
  • Add migration guide for users extending library classes

4. Suspicious PHPStan Suppression

In lang/php/lib/DataFile/AvroDataIOReader.php:100-101:

/** @phpstan-ignore if.alwaysFalse */
if ($this->isEof()) {
    break;
}

PHPStan believes this condition is always false, but the developer disagrees.

Recommendation:

  • Investigate this suppression - it could indicate:
    • A logic error in the code
    • Incorrect PHPStan inference
    • Need for better type annotations
  • Add a comment explaining why this suppression is necessary

5. Type Casting in AvroDebug

The explicit (string) casts added are good defensive programming, but verify behavior with non-numeric inputs:

$x[] = str_pad((string) $db, 3, '0', STR_PAD_LEFT);

🔒 Security Considerations

No significant security concerns identified. The type safety improvements actually reduce security risks by preventing type confusion bugs.


🎯 Performance Considerations

Neutral to Positive:

  • match expressions are slightly faster than switch statements
  • Type hints enable JIT optimizations in PHP 8.1+
  • Extracted methods may have negligible function call overhead but improve maintainability

📋 Code Quality Assessment

Aspect Rating Notes
Type Safety ⭐⭐⭐⭐⭐ Excellent improvements
Naming Conventions ⭐⭐⭐⭐⭐ Consistent camelCase
Documentation ⭐⭐⭐⭐ Good PHPDoc, could document baseline
Test Coverage ⭐⭐⭐ Relies on existing tests, needs new ones
Breaking Changes ⭐⭐⭐ Minor concerns, needs documentation

✅ Verdict: Approve with Recommendations

This PR significantly improves code quality and is ready to merge with the understanding that:

  1. Follow-up work is needed to address the 37 baselined PHPStan errors
  2. Additional test coverage should be added (can be in a follow-up PR)
  3. Breaking changes should be documented in release notes
  4. The if.alwaysFalse suppression should be investigated

The pragmatic approach of using a baseline for incremental improvement is sound. The code is cleaner, safer, and more maintainable after these changes.


📝 Action Items

Before Merge:

  • Investigate the if.alwaysFalse suppression in AvroDataIOReader.php:100

Follow-up PRs:

  • Add unit tests for extracted compression methods
  • Create issues to track resolution of baselined PHPStan errors
  • Document breaking changes in CHANGELOG/release notes
  • Add migration guide for class extension scenarios

Great work on modernizing the codebase! 🎉

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the code quality and maintainability of the PHP Avro library by introducing comprehensive type hinting, adhering to stricter static analysis rules (PHPStan level 6), and refactoring several core components for better readability and error handling. Key changes include consistent variable naming, explicit type declarations for parameters and return values, and modularization of compression/decompression logic.

Highlights

  • Type Hinting and Strict Types: Extensive addition of parameter and return type declarations across numerous methods and properties in almost all PHP files. This includes scalar types, object types (e.g., \GMP), union types (e.g., int|string|\GMP), and array shapes (e.g., array<string, mixed>, list<string>). The declare(strict_types=1); directive was added to AvroIOSchemaMatchException.php.
  • Code Quality and Readability: Refactored variable names from snake_case to camelCase for consistency (e.g., $biginteger_mode to $bigIntegerMode, $datum_writer to $datumWriter). Docblock comments were updated to use @return instead of @returns and to remove redundant @param tags where type hints are already present. Compression/decompression logic in AvroDataIOReader.php and AvroDataIOWriter.php was extracted into dedicated private methods for better modularity and error handling.
  • PHPStan Level Increase: The PHPStan analysis level was increased from 3 to 6, and a new phpstan-baseline.neon file was introduced to manage existing ignorable errors, indicating a significant step towards stricter static analysis and improved code reliability.
  • Improved Error Handling: Enhanced error handling for compression and decompression operations (gzip, zstd, snappy, bzip2) in AvroDataIOReader.php and AvroDataIOWriter.php, providing more specific exceptions when these operations fail or required extensions are missing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Jan 26, 2026

🤖 Augment PR Summary

Summary: This PR focuses on improving the PHP implementation’s code quality and raising static analysis coverage.

Changes:

  • Added/strengthened scalar type hints, return types, and PHPDoc generics across core classes (IO, Datum, Schema, Protocol).
  • Standardized naming conventions (e.g., camelCase properties/variables) and cleaned up various docblocks.
  • Refactored Avro datafile reader/writer compression handling into dedicated (de)compression helpers with explicit failure checks.
  • Improved schema/protocol typing via PHPStan type aliases (import-type / phpstan-type) for better inference.
  • Updated enum duplicate-symbol reporting to produce a readable message.
  • Introduced a PHPStan baseline and increased PHPStan level to 6 (with adjusted settings) to tighten static analysis.
  • Adjusted tests for the updated APIs and improved assertion behavior (e.g., using fail()).

Technical Notes: The PHPStan config now includes the baseline, analyzes lib, and sets treatPhpDocTypesAsCertain: false to reduce false certainty from docblock types.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 4 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

case self::ERROR_SCHEMA:
case self::REQUEST_SCHEMA:
if (!($expected_schema instanceof AvroRecordSchema)) {
if (!$expected_schema instanceof AvroRecordSchema) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!$expected_schema instanceof AvroRecordSchema is affected by operator precedence (it’s parsed as (!$expected_schema) instanceof ...), so this guard will never trigger and non-record schemas could be treated as records.

The same ! ... instanceof precedence issue also appears in other locations.

Other Locations
  • lang/php/lib/Datum/AvroIODatumReader.php:231

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

foreach ($fieldSchema->fields() as $field) {
$field_name = $field->name();
if (!$json_val = $default_value[$field_name]) {
if (!$json_val = $defaultValue[$field_name]) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In readDefaultValue() for records, if (!$json_val = $defaultValue[$field_name]) treats valid falsy defaults (e.g., 0, false, '') as missing and will also raise an undefined index notice when the key doesn’t exist.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

break;
case self::SEEK_CUR:
if (0 > $this->current_index + $whence) {
if (0 > $this->currentIndex + $whence) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In seek()’s SEEK_CUR branch, the bounds check uses $whence instead of the relative $offset, so it can incorrectly allow negative positions or throw for valid seeks.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

if (!extension_loaded('snappy')) {
throw new AvroException('Please install ext-snappy to use snappy compression.');
}
$crc32 = unpack('N', substr((string) $compressed, -4))[1];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snappyUncompress() assumes $compressed is at least 4 bytes (substr(..., -4) + unpack(...)[1]); truncated/corrupt blocks can produce warnings and potentially a fatal error instead of a controlled exception.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

return $writersSchemaType === $readersSchemaType;
}

switch ($readers_schema_type) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primitive type promotion broken by early return

High Severity

The new primitive type check in schemasMatch returns false when writer and reader types differ, even when type promotion is valid per the Avro spec. For example, if a writer uses int and a reader uses long, both are primitives, so the condition passes, but since 'int' !== 'long', it returns false immediately. The type promotion logic at lines 185-208 (allowing int→long, long→float, float→double) is never reached, breaking schema evolution for primitive types.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lang/php/lib/Datum/AvroIODatumReader.php (1)

457-526: Handle falsy default values correctly in record defaults.
The current check treats valid defaults like 0, false, or "" as missing and can also emit undefined-index notices. Prefer array_key_exists to distinguish “missing” from “falsy.”

🐛 Proposed fix
-                foreach ($fieldSchema->fields() as $field) {
+                foreach ($fieldSchema->fields() as $field) {
                     $field_name = $field->name();
-                    if (!$json_val = $defaultValue[$field_name]) {
-                        $json_val = $field->default_value();
-                    }
+                    if (is_array($defaultValue) && array_key_exists($field_name, $defaultValue)) {
+                        $json_val = $defaultValue[$field_name];
+                    } else {
+                        $json_val = $field->default_value();
+                    }
 
                     $record[$field_name] = $this->readDefaultValue(
                         $field->type(),
                         $json_val
                     );
                 }
🤖 Fix all issues with AI agents
In `@lang/php/lib/Avro.php`:
- Around line 63-70: Update the docblock on the usesGmp() method to reference
the renamed property $bigIntegerMode (instead of Avro::$biginteger_mode);
mention the dependency on Avro::check64Bit() (or Avro::checkPlatform()) setting
Avro::$bigIntegerMode so the documentation matches the actual property name and
casing.

In `@lang/php/lib/DataFile/AvroDataIOReader.php`:
- Around line 214-221: The docblock for readBlockHeader() currently says
"@return int" but the method signature returns string|int; update the docblock
to "@return string|int" (and adjust the descriptive text if present) so the
PHPDoc matches the method signature, referencing the readBlockHeader() method
and its use of $this->decoder->readLong(); alternatively, if you want to
guarantee an int, change the signature to return int and cast the decoder result
to int where readLong() is returned.

In `@lang/php/lib/DataFile/AvroDataIOWriter.php`:
- Around line 86-96: The writers schema passed into AvroDataIOWriter is not
being applied to the injected datumWriter, leaving datumWriter potentially
unconfigured; after setting $this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR]
and before writeHeader() in the AvroDataIOWriter initialization, assign the
passed $writersSchema to the datum writer (i.e., set the datumWriter's
writersSchema property or equivalent on $this->datumWriter) so
$this->datumWriter and $this->metadata stay in sync.

In `@lang/php/lib/Datum/AvroIODatumWriter.php`:
- Around line 136-140: The throw in AvroIODatumWriter is accessing
$writersSchema->type as a property while the rest of the class (e.g., earlier at
line 76) uses the method $writersSchema->type(); update the throw to call
$writersSchema->type() instead of accessing the property, and scan
AvroIODatumWriter for any other occurrences of ->type property access to make
them consistent with the method form.

In `@lang/php/lib/IO/AvroStringIO.php`:
- Around line 132-136: In the SEEK_CUR branch of AvroStringIO (case
self::SEEK_CUR) the boundary check incorrectly uses $whence instead of $offset;
update the check to use $offset so it reads the intended condition (e.g., verify
0 > $this->currentIndex + $offset) and keep the subsequent update
$this->currentIndex += $offset unchanged, ensuring seeks before the buffer start
are detected correctly.

In `@lang/php/lib/Protocol/AvroProtocol.php`:
- Around line 33-38: Update the phpstan array-shape alias
AvroProtocolDefinitionArray to mark the "types" and "messages" keys as optional
(use the nullable/optional `?` shape syntax) so it matches how realParse()
checks `array_key_exists("types", $avro)` and uses `$avro["messages"] ?? null`;
edit the AvroProtocolDefinitionArray type (and the similar shape around lines
71-75) so `types` and `messages` are optional in the shape, ensuring the static
type matches the runtime checks in realParse().

In `@lang/php/lib/Protocol/AvroProtocolMessage.php`:
- Around line 32-35: Update the PHPStan type alias for the Avro protocol message
to make response optional to match the constructor's behavior: locate the
AvroProtocolMessage class/type definitions (symbols
AvroProtocolMessageDefinitionArray and the other similar alias around lines
43-50) and change the shape so response is optional (e.g., use response?: string
or response: string|null) in both occurrences; ensure any dependent
phpstan-import-type references remain unchanged so the constructor's
array_key_exists('response', $avro) and the ?AvroSchema typed property are
consistent with the type.

In `@lang/php/lib/Schema/AvroEnumSchema.php`:
- Around line 94-98: Update the AvroEnumSchema::symbolIndex docblock to reflect
the tightened signature and new exception behavior: change/add `@param` string
$symbol and `@return` int, add `@throws` \TypeError to indicate non-string arguments
will raise a TypeError, and keep/clarify `@throws` AvroException for cases where
the symbol is not present; include a short note that callers must pass a string
to avoid TypeError (this clarifies the public API BC change).

In `@lang/php/lib/Schema/AvroRecordSchema.php`:
- Around line 64-65: The destructuring assigns an unused variable $x from
nameAndNamespace(); update the assignment in AvroRecordSchema (around the call
to nameAndNamespace()) to discard the first element (e.g., use an
underscore-prefixed variable like $_ or a two‑slot list with the first slot
ignored) so only $namespace is used, then pass $namespace into
self::parseFields($fields, $namespace, $schemata); keep parseFields and
nameAndNamespace() unchanged.

In `@lang/php/test/InterOpTest.php`:
- Around line 23-38: In InterOpTest::setUp, guard the file_get_contents call
before assigning to the typed property $projectionJson and before parsing into
$projection: call file_get_contents(AVRO_INTEROP_SCHEMA) into a local variable,
check if the result === false (or !is_string), and if so call
$this->fail("Unable to read AVRO_INTEROP_SCHEMA: file missing or unreadable") or
throw a clear exception; otherwise assign the string to $this->projectionJson
and then call AvroSchema::parse($this->projectionJson) to populate
$this->projection.
🧹 Nitpick comments (16)
lang/php/lib/Schema/AvroArraySchema.php (1)

62-68: PHPDoc tag correction looks good, but return type union may be inconsistent.

The @returns@return fix is correct. However, the declared return type AvroName|AvroSchema appears inconsistent with the actual implementation: the $items property (line 32) is typed as AvroSchema, and the method simply returns $this->items. The AvroName variant in the union type seems unreachable.

Consider whether the return type should be just AvroSchema, or if the property type needs updating to match the declared union.

lang/php/lib/IO/AvroStringIO.php (1)

82-84: Stale TODO comment references old naming convention.

The TODO comment on line 82 still mentions current_index but the property has been renamed to currentIndex.

📝 Proposed fix
     /**
-     * `@todo` test for fencepost errors wrt updating current_index
+     * `@todo` test for fencepost errors wrt updating currentIndex
      * `@param` mixed $len
      * `@return` string bytes read from buffer
      */
lang/php/lib/AvroDebug.php (1)

57-65: PHPDoc references old parameter name.

The docblock on line 58 still references $debug_level but the parameter has been renamed to $debugLevel.

📝 Proposed fix
     /**
-     * `@return` bool true if the given $debug_level is equivalent
+     * `@return` bool true if the given $debugLevel is equivalent
      *                  or more verbose than than the current debug level
      *                  and false otherwise.
      */
lang/php/lib/AvroGMP.php (3)

78-103: Missing explicit return type declaration.

The shiftLeft method has @return \GMP in its PHPDoc but lacks an explicit return type in the method signature. For consistency with the private helper methods (e.g., gmp_0xfs(): \GMP), consider adding the return type.

Also, line 79 has a typo: @interal should be @internal.

♻️ Proposed fix
     /**
-     * `@interal` Only works up to shift 63 (doesn't wrap bits around).
+     * `@internal` Only works up to shift 63 (doesn't wrap bits around).
      * `@param` int $shift number of bits to shift left
      * `@return` \GMP $g shifted left
      */
-    public static function shiftLeft(int|string|\GMP $g, int $shift)
+    public static function shiftLeft(int|string|\GMP $g, int $shift): \GMP

105-111: Missing explicit return type declaration.

The gmpTwosComplement method has @return \GMP in PHPDoc but lacks an explicit return type declaration for consistency.

♻️ Proposed fix
     /**
      * `@return` \GMP resource 64-bit two's complement of input.
      */
-    public static function gmpTwosComplement(int|string|\GMP $g)
+    public static function gmpTwosComplement(int|string|\GMP $g): \GMP

113-141: Missing explicit return type declaration.

The shiftRight method has @return \GMP in PHPDoc but lacks an explicit return type declaration for consistency.

♻️ Proposed fix
     /**
      * Arithmetic right shift
      * `@param` int $shift number of bits to shift right
      * `@return` \GMP $g shifted right $shift bits
      */
-    public static function shiftRight(int|string|\GMP $g, int $shift)
+    public static function shiftRight(int|string|\GMP $g, int $shift): \GMP
lang/php/lib/DataFile/AvroDataIOReader.php (2)

94-94: Use strict comparison for consistency.

Line 94 uses loose comparison (==) while other codec checks in this file use strict comparison (===). For consistency and to avoid type coercion issues, use strict comparison here as well.

Proposed fix
-            if (0 == $this->blockCount) {
+            if (0 === $this->blockCount) {

274-291: Potential issue with unpack() return value.

unpack() can return false on failure. While unlikely with controlled input, accessing [1] without validating the return could cause an error.

Defensive fix
     private function snappyUncompress(string $compressed): string
     {
         if (!extension_loaded('snappy')) {
             throw new AvroException('Please install ext-snappy to use snappy compression.');
         }
-        $crc32 = unpack('N', substr((string) $compressed, -4))[1];
+        $unpacked = unpack('N', substr($compressed, -4));
+        if (false === $unpacked) {
+            throw new AvroException('snappy uncompression failed - could not read CRC32.');
+        }
+        $crc32 = $unpacked[1];
-        $datum = snappy_uncompress(substr((string) $compressed, 0, -4));
+        $datum = snappy_uncompress(substr($compressed, 0, -4));
 
         if (false === $datum) {
             throw new AvroException('snappy uncompression failed.');
         }
 
         if ($crc32 !== crc32($datum)) {
             throw new AvroException('snappy uncompression failed - crc32 mismatch.');
         }
 
         return $datum;
     }

Note: The (string) casts on lines 279-280 are redundant since $compressed is already typed as string.

lang/php/lib/Schema/AvroRecordSchema.php (1)

152-167: Consider memoization for fieldsByAlias().

Unlike fieldsHash() which memoizes the result, fieldsByAlias() rebuilds the hash on every call. If this method is called frequently, consider applying the same memoization pattern.

Proposed memoization pattern
+    /**
+     * `@var` null|array<string, AvroField> map of field aliases to field objects.
+     */
+    private ?array $fieldsByAliasHash = null;
+
     /**
      * `@return` array<string, AvroField>
      */
     public function fieldsByAlias(): array
     {
+        if (is_null($this->fieldsByAliasHash)) {
             $hash = [];
             foreach ($this->fields as $field) {
                 if ($field->hasAliases()) {
                     foreach ($field->getAliases() as $a) {
                         $hash[$a] = $field;
                     }
                 }
             }
+            $this->fieldsByAliasHash = $hash;
+        }
 
-        return $hash;
+        return $this->fieldsByAliasHash;
     }
lang/php/lib/Datum/AvroIODatumWriter.php (3)

143-159: Inconsistent parameter naming.

The writeBytes method still uses $writers_schema (snake_case) while other methods use $writersSchema (camelCase). Update for consistency with the rest of the file.

Proposed fix
-    private function writeBytes(AvroSchema $writers_schema, string $datum, AvroIOBinaryEncoder $encoder): void
+    private function writeBytes(AvroSchema $writersSchema, string $datum, AvroIOBinaryEncoder $encoder): void
     {
-        $logicalType = $writers_schema->logicalType();
+        $logicalType = $writersSchema->logicalType();

178-193: Incomplete camelCase rename in writeMap.

The local variable $datum_count on line 184 should be renamed to $datumCount for consistency with the camelCase refactoring in this PR.

Proposed fix
     private function writeMap(AvroMapSchema $writersSchema, array $datum, AvroIOBinaryEncoder $encoder): void
     {
-        $datum_count = count($datum);
-        if ($datum_count > 0) {
-            $encoder->writeLong($datum_count);
+        $datumCount = count($datum);
+        if ($datumCount > 0) {
+            $encoder->writeLong($datumCount);

252-271: Incomplete camelCase rename in writeUnion.

The local variables $datum_schema_index and $datum_schema on lines 254-255 should be renamed to camelCase for consistency.

Proposed fix
     private function writeUnion(AvroUnionSchema $writersSchema, mixed $datum, AvroIOBinaryEncoder $encoder): void
     {
-        $datum_schema_index = -1;
-        $datum_schema = null;
+        $datumSchemaIndex = -1;
+        $datumSchema = null;
         foreach ($writersSchema->schemas() as $index => $schema) {
             if (AvroSchema::isValidDatum($schema, $datum)) {
-                $datum_schema_index = $index;
-                $datum_schema = $schema;
+                $datumSchemaIndex = $index;
+                $datumSchema = $schema;
 
                 break;
             }
         }
 
-        if (is_null($datum_schema)) {
+        if (is_null($datumSchema)) {
             throw new AvroIOTypeException($writersSchema, $datum);
         }
 
-        $encoder->writeLong($datum_schema_index);
-        $this->writeValidatedData($datum_schema, $datum, $encoder);
+        $encoder->writeLong($datumSchemaIndex);
+        $this->writeValidatedData($datumSchema, $datum, $encoder);
     }
lang/php/lib/DataFile/AvroDataIO.php (2)

132-165: Inconsistent parameter naming in openFile.

The parameter $file_path uses snake_case while other parameters use camelCase ($schemaJson). Consider renaming to $filePath for consistency.

Proposed fix
     public static function openFile(
-        string $file_path,
+        string $filePath,
         string $mode = AvroIO::READ_MODE,
         ?string $schemaJson = null,
         string $codec = self::NULL_CODEC
     ): AvroDataIOReader|AvroDataIOWriter {

170-173: Simplify null check in isValidCodec.

Since $codec is typed as ?string, the is_string($codec) check is essentially a null check. Consider using !is_null($codec) or simply $codec !== null for clarity.

Proposed fix
     public static function isValidCodec(?string $codec): bool
     {
-        return is_string($codec) && in_array($codec, self::validCodecs());
+        return null !== $codec && in_array($codec, self::validCodecs(), true);
     }

Note: Also added true as the third parameter to in_array for strict comparison.

lang/php/lib/Datum/AvroIOBinaryDecoder.php (2)

161-164: Minor: Remove empty line in skipNull.

The empty line in the method body is unnecessary.

Proposed fix
     public function skipNull(): void
     {
-
     }

218-221: Consider suppressing unused parameter warning.

The $writersSchema parameter in skipEnum is unused but likely kept for API consistency with other skip methods. Add a PHPDoc annotation to suppress the warning and document the intent.

Proposed fix
+    /**
+     * `@param` AvroSchema $writersSchema kept for API consistency with other skip methods
+     * `@SuppressWarnings`(PHPMD.UnusedFormalParameter)
+     */
     public function skipEnum(AvroSchema $writersSchema, AvroIOBinaryDecoder $decoder): void
     {
         $decoder->skipInt();
     }

Comment on lines +63 to 70
* @return bool true if the PHP GMP extension is used and false otherwise.
* @internal Requires Avro::check64Bit() (exposed via Avro::checkPlatform())
* to have been called to set Avro::$biginteger_mode.
*/
public static function usesGmp(): bool
{
return self::GMP_BIGINTEGER_MODE === self::$biginteger_mode;
return self::GMP_BIGINTEGER_MODE === self::$bigIntegerMode;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update the docblock to reference the renamed property.

Line 65 still references Avro::$biginteger_mode but the property was renamed to $bigIntegerMode.

📝 Suggested fix
     /**
      * `@return` bool true if the PHP GMP extension is used and false otherwise.
      * `@internal` Requires Avro::check64Bit() (exposed via Avro::checkPlatform())
-     *           to have been called to set Avro::$biginteger_mode.
+     *           to have been called to set Avro::$bigIntegerMode.
      */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* @return bool true if the PHP GMP extension is used and false otherwise.
* @internal Requires Avro::check64Bit() (exposed via Avro::checkPlatform())
* to have been called to set Avro::$biginteger_mode.
*/
public static function usesGmp(): bool
{
return self::GMP_BIGINTEGER_MODE === self::$biginteger_mode;
return self::GMP_BIGINTEGER_MODE === self::$bigIntegerMode;
}
/**
* `@return` bool true if the PHP GMP extension is used and false otherwise.
* `@internal` Requires Avro::check64Bit() (exposed via Avro::checkPlatform())
* to have been called to set Avro::$bigIntegerMode.
*/
public static function usesGmp(): bool
{
return self::GMP_BIGINTEGER_MODE === self::$bigIntegerMode;
}
🤖 Prompt for AI Agents
In `@lang/php/lib/Avro.php` around lines 63 - 70, Update the docblock on the
usesGmp() method to reference the renamed property $bigIntegerMode (instead of
Avro::$biginteger_mode); mention the dependency on Avro::check64Bit() (or
Avro::checkPlatform()) setting Avro::$bigIntegerMode so the documentation
matches the actual property name and casing.

Comment on lines +214 to 221
* @return int length in bytes of the block.
*/
private function readBlockHeader(): string|int
{
$this->block_count = $this->decoder->readLong();
$this->blockCount = $this->decoder->readLong();

return $this->decoder->readLong();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Return type mismatch between docblock and signature.

The docblock states @return int but the method signature declares string|int. Either update the docblock to match the actual return type or narrow the implementation if only int is ever returned.

Proposed fix if int is always returned
     /**
      * Reads the block header (which includes the count of items in the block
      * and the length in bytes of the block)
      * `@return` int length in bytes of the block.
      */
-    private function readBlockHeader(): string|int
+    private function readBlockHeader(): int
     {
-        $this->blockCount = $this->decoder->readLong();
+        $this->blockCount = (int) $this->decoder->readLong();
 
-        return $this->decoder->readLong();
+        return (int) $this->decoder->readLong();
     }
🤖 Prompt for AI Agents
In `@lang/php/lib/DataFile/AvroDataIOReader.php` around lines 214 - 221, The
docblock for readBlockHeader() currently says "@return int" but the method
signature returns string|int; update the docblock to "@return string|int" (and
adjust the descriptive text if present) so the PHPDoc matches the method
signature, referencing the readBlockHeader() method and its use of
$this->decoder->readLong(); alternatively, if you want to guarantee an int,
change the signature to return int and cast the decoder result to int where
readLong() is returned.

Comment on lines +86 to 96
if ($writersSchema) {
if (!AvroDataIO::isValidCodec($codec)) {
throw new AvroDataIOException(
sprintf('codec %s is not supported', $codec)
);
}

$this->sync_marker = self::generateSyncMarker();
$this->syncMarker = self::generateSyncMarker();
$this->metadata[AvroDataIO::METADATA_CODEC_ATTR] = $this->codec = $codec;
$this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] = (string) $writers_schema;
$this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] = (string) $writersSchema;
$this->writeHeader();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check the AvroIODatumWriter class structure
echo "=== AvroIODatumWriter class definition ==="
rg -n "class AvroIODatumWriter" lang/php/lib/Datum/AvroIODatumWriter.php -A 20

echo -e "\n=== writersSchema property in AvroIODatumWriter ==="
rg -n "writersSchema|writers_schema" lang/php/lib/Datum/AvroIODatumWriter.php

echo -e "\n=== AvroDataIOWriter class context ==="
rg -n "class AvroDataIOWriter" lang/php/lib/DataFile/AvroDataIOWriter.php -A 15

echo -e "\n=== datumWriter initialization in AvroDataIOWriter ==="
rg -n "datumWriter" lang/php/lib/DataFile/AvroDataIOWriter.php -B 2 -A 2

Repository: martin-augment/avro

Length of output: 5183


🏁 Script executed:

#!/bin/bash
# Find usages and tests of AvroDataIOWriter constructor
echo "=== Files that instantiate AvroDataIOWriter ==="
rg -n "new AvroDataIOWriter" --type php

echo -e "\n=== Test files for AvroDataIOWriter ==="
fd ".*Test.*" --type f | grep -i "writer\|dataio" | head -20

# Check test files for context on how datumWriter is passed
echo -e "\n=== Look for test patterns ==="
rg -n "AvroDataIOWriter" lang/php/tests/ -B 2 -A 2 | head -50

Repository: martin-augment/avro

Length of output: 893


🏁 Script executed:

#!/bin/bash
# Examine the actual usages
echo "=== AvroDataIO.php constructor usage ==="
rg -n "new AvroDataIOWriter" lang/php/lib/DataFile/AvroDataIO.php -B 5 -A 5

echo -e "\n=== write_read.php example ==="
cat -n lang/php/examples/write_read.php | sed -n '60,80p'

echo -e "\n=== StringIOTest.php ==="
cat -n lang/php/test/StringIOTest.php | sed -n '70,95p'

echo -e "\n=== Check how datumWriter is created before passing ==="
rg -n "new AvroIODatumWriter" lang/php -B 2 -A 2

Repository: martin-augment/avro

Length of output: 6019


Sync $writersSchema to datumWriter for consistency and robustness.

The existing file path (line 104) explicitly sets $this->datumWriter->writersSchema from the file metadata, but the new file path (lines 86-96) doesn't sync the passed $writersSchema to the injected datumWriter instance. While all current callers pre-configure the datumWriter with schema before passing it, this creates inconsistency and makes the code fragile if someone passes an unconfigured writer with $writersSchema. Add this pattern to match the existing precedent:

Suggested change
             $this->syncMarker = self::generateSyncMarker();
             $this->metadata[AvroDataIO::METADATA_CODEC_ATTR] = $this->codec = $codec;
             $this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] = (string) $writersSchema;
+            if ($writersSchema instanceof AvroSchema) {
+                $this->datumWriter->writersSchema = $writersSchema;
+            } else {
+                $this->datumWriter->writersSchema = AvroSchema::parse($writersSchema);
+            }
             $this->writeHeader();
🤖 Prompt for AI Agents
In `@lang/php/lib/DataFile/AvroDataIOWriter.php` around lines 86 - 96, The writers
schema passed into AvroDataIOWriter is not being applied to the injected
datumWriter, leaving datumWriter potentially unconfigured; after setting
$this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] and before writeHeader() in
the AvroDataIOWriter initialization, assign the passed $writersSchema to the
datum writer (i.e., set the datumWriter's writersSchema property or equivalent
on $this->datumWriter) so $this->datumWriter and $this->metadata stay in sync.

Comment on lines 136 to 140
throw new AvroException(sprintf(
'Unknown type: %s',
$writers_schema->type
$writersSchema->type
));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent property access vs method call.

Line 138 accesses $writersSchema->type as a property, while line 76 uses $writersSchema->type() as a method. Use the method consistently.

Proposed fix
             default:
                 throw new AvroException(sprintf(
                     'Unknown type: %s',
-                    $writersSchema->type
+                    $writersSchema->type()
                 ));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
throw new AvroException(sprintf(
'Unknown type: %s',
$writers_schema->type
$writersSchema->type
));
}
throw new AvroException(sprintf(
'Unknown type: %s',
$writersSchema->type()
));
}
🤖 Prompt for AI Agents
In `@lang/php/lib/Datum/AvroIODatumWriter.php` around lines 136 - 140, The throw
in AvroIODatumWriter is accessing $writersSchema->type as a property while the
rest of the class (e.g., earlier at line 76) uses the method
$writersSchema->type(); update the throw to call $writersSchema->type() instead
of accessing the property, and scan AvroIODatumWriter for any other occurrences
of ->type property access to make them consistent with the method form.

Comment on lines 132 to +136
case self::SEEK_CUR:
if (0 > $this->current_index + $whence) {
if (0 > $this->currentIndex + $whence) {
throw new AvroIOException('Cannot seek before beginning of file.');
}
$this->current_index += $offset;
$this->currentIndex += $offset;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: Wrong variable used in SEEK_CUR boundary check.

Line 133 uses $whence instead of $offset for the boundary check. Since $whence equals self::SEEK_CUR (constant value 1), this check is incorrect and will not properly detect seeks before the beginning of the buffer.

🐛 Proposed fix
         case self::SEEK_CUR:
-            if (0 > $this->currentIndex + $whence) {
+            if (0 > $this->currentIndex + $offset) {
                 throw new AvroIOException('Cannot seek before beginning of file.');
             }
             $this->currentIndex += $offset;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case self::SEEK_CUR:
if (0 > $this->current_index + $whence) {
if (0 > $this->currentIndex + $whence) {
throw new AvroIOException('Cannot seek before beginning of file.');
}
$this->current_index += $offset;
$this->currentIndex += $offset;
case self::SEEK_CUR:
if (0 > $this->currentIndex + $offset) {
throw new AvroIOException('Cannot seek before beginning of file.');
}
$this->currentIndex += $offset;
🤖 Prompt for AI Agents
In `@lang/php/lib/IO/AvroStringIO.php` around lines 132 - 136, In the SEEK_CUR
branch of AvroStringIO (case self::SEEK_CUR) the boundary check incorrectly uses
$whence instead of $offset; update the check to use $offset so it reads the
intended condition (e.g., verify 0 > $this->currentIndex + $offset) and keep the
subsequent update $this->currentIndex += $offset unchanged, ensuring seeks
before the buffer start are detected correctly.

Comment on lines +33 to +38
* @phpstan-type AvroProtocolDefinitionArray array{
* types: AvroSchemaDefinitionArray,
* protocol: string,
* namespace: string,
* messages: array<string, AvroProtocolMessageDefinitionArray>
* }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd lang/php/lib/Protocol && wc -l AvroProtocol.php

Repository: martin-augment/avro

Length of output: 83


🏁 Script executed:

cd lang/php/lib/Protocol && cat -n AvroProtocol.php | sed -n '1,100p'

Repository: martin-augment/avro

Length of output: 4058


🏁 Script executed:

cd lang/php/lib/Protocol && grep -n "realParse" AvroProtocol.php | head -20

Repository: martin-augment/avro

Length of output: 245


Mark types and messages as optional in the PHPStan array-shape.

The realParse() method checks array_key_exists("types", $avro) before accessing types (line 78) and uses $avro["messages"] ?? null for messages (line 83), making both keys optional. The current type definition incorrectly marks them as required, which can cause PHPStan false positives. Update the shape at lines 33-38 to mark these keys with ?:

Proposed adjustment
- * `@phpstan-type` AvroProtocolDefinitionArray array{
- *     types: AvroSchemaDefinitionArray,
- *     protocol: string,
- *     namespace: string,
- *     messages: array<string, AvroProtocolMessageDefinitionArray>
- * }
+ * `@phpstan-type` AvroProtocolDefinitionArray array{
+ *     protocol: string,
+ *     namespace: string,
+ *     types?: AvroSchemaDefinitionArray,
+ *     messages?: array<string, AvroProtocolMessageDefinitionArray>
+ * }

Also applies to: 71-75

🤖 Prompt for AI Agents
In `@lang/php/lib/Protocol/AvroProtocol.php` around lines 33 - 38, Update the
phpstan array-shape alias AvroProtocolDefinitionArray to mark the "types" and
"messages" keys as optional (use the nullable/optional `?` shape syntax) so it
matches how realParse() checks `array_key_exists("types", $avro)` and uses
`$avro["messages"] ?? null`; edit the AvroProtocolDefinitionArray type (and the
similar shape around lines 71-75) so `types` and `messages` are optional in the
shape, ensuring the static type matches the runtime checks in realParse().

Comment on lines +32 to +35
/**
* @phpstan-import-type AvroSchemaDefinitionArray from AvroSchema
* @phpstan-type AvroProtocolMessageDefinitionArray array{request: AvroSchemaDefinitionArray, response: string}
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and content
cat -n lang/php/lib/Protocol/AvroProtocolMessage.php | head -60

Repository: martin-augment/avro

Length of output: 2467


🏁 Script executed:

# Continue reading the constructor to see response handling
cat -n lang/php/lib/Protocol/AvroProtocolMessage.php | sed -n '60,85p'

Repository: martin-augment/avro

Length of output: 687


Make response optional in the PHPStan message shape.
The constructor uses array_key_exists('response', $avro) to conditionally process the response, allowing it to be absent from the array. The property is also nullable (?AvroSchema), but the type definition currently requires it.

Proposed adjustment
- * `@phpstan-type` AvroProtocolMessageDefinitionArray array{request: AvroSchemaDefinitionArray, response: string}
+ * `@phpstan-type` AvroProtocolMessageDefinitionArray array{request: AvroSchemaDefinitionArray, response?: string}

Also applies to: 43-50

🤖 Prompt for AI Agents
In `@lang/php/lib/Protocol/AvroProtocolMessage.php` around lines 32 - 35, Update
the PHPStan type alias for the Avro protocol message to make response optional
to match the constructor's behavior: locate the AvroProtocolMessage class/type
definitions (symbols AvroProtocolMessageDefinitionArray and the other similar
alias around lines 43-50) and change the shape so response is optional (e.g.,
use response?: string or response: string|null) in both occurrences; ensure any
dependent phpstan-import-type references remain unchanged so the constructor's
array_key_exists('response', $avro) and the ?AvroSchema typed property are
consistent with the type.

Comment on lines +94 to 98
* @throws AvroException
* @return int the index of the given $symbol in the enum schema
*/
public function symbolIndex($symbol)
public function symbolIndex(string $symbol): int
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -nP '\bsymbolIndex\s*\(' -g'*.php' -B2 -A2

Repository: martin-augment/avro

Length of output: 921


🏁 Script executed:

# Search more comprehensively for symbolIndex calls
rg 'symbolIndex' -g'*.php' -n

# Also check if there are any tests related to AvroEnumSchema
fd -e php -e phpt | xargs grep -l 'AvroEnumSchema\|symbolIndex' 2>/dev/null | head -20

Repository: martin-augment/avro

Length of output: 443


🏁 Script executed:

# Check AvroIODatumReader.php for symbolIndex usage
rg -n 'symbolIndex' lang/php/lib/Datum/AvroIODatumReader.php -B2 -A2

# Also check the implementation in AvroEnumSchema.php
cat -n lang/php/lib/Schema/AvroEnumSchema.php | head -110 | tail -30

Repository: martin-augment/avro

Length of output: 1149


🏁 Script executed:

# Check git log for recent changes to symbolIndex
git log --oneline -n 20 -- lang/php/lib/Schema/AvroEnumSchema.php

# Check if there's a CHANGELOG or version file
fd -e md -e txt | grep -iE '(changelog|version|release)' | head -10

Repository: martin-augment/avro

Length of output: 636


🏁 Script executed:

# Get the diff for the commit that touched AvroEnumSchema.php
git show fc8392377 -- lang/php/lib/Schema/AvroEnumSchema.php

Repository: martin-augment/avro

Length of output: 3966


🏁 Script executed:

# Check if AvroEnumSchema.php existed in previous commits
git log --follow --all --oneline -- lang/php/lib/Schema/AvroEnumSchema.php

# Check the parent of fc8392377 to see the state before
git show fc8392377^:lang/php/lib/Schema/AvroEnumSchema.php 2>&1 | head -20

Repository: martin-augment/avro

Length of output: 501


🏁 Script executed:

# Get the previous version from the commit before fc8392377
git show 3f0a45a28:lang/php/lib/Schema/AvroEnumSchema.php 2>&1 | grep -A10 "symbolIndex"

Repository: martin-augment/avro

Length of output: 365


Document public API BC change: symbolIndex(string $symbol): int.
Parameter type was tightened from untyped to string and return type added. Non-string arguments now throw TypeError instead of AvroException, but the docblock still claims @throws AvroException. Internal call sites are compatible (only one found: AvroIODatumWriter.php:237). Update documentation to clarify the strict type requirement and the new exception behavior for external consumers.

🤖 Prompt for AI Agents
In `@lang/php/lib/Schema/AvroEnumSchema.php` around lines 94 - 98, Update the
AvroEnumSchema::symbolIndex docblock to reflect the tightened signature and new
exception behavior: change/add `@param` string $symbol and `@return` int, add
`@throws` \TypeError to indicate non-string arguments will raise a TypeError, and
keep/clarify `@throws` AvroException for cases where the symbol is not present;
include a short note that callers must pass a string to avoid TypeError (this
clarifies the public API BC change).

Comment on lines 64 to 65
[$x, $namespace] = $name->nameAndNamespace();
$this->fields = self::parseFields($fields, $namespace, $schemata);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unused variable $x.

The variable $x from the destructuring assignment is never used. Consider using an underscore prefix to indicate intentional discard, or restructure to only extract the needed value.

Proposed fix
-        [$x, $namespace] = $name->nameAndNamespace();
+        [, $namespace] = $name->nameAndNamespace();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[$x, $namespace] = $name->nameAndNamespace();
$this->fields = self::parseFields($fields, $namespace, $schemata);
[, $namespace] = $name->nameAndNamespace();
$this->fields = self::parseFields($fields, $namespace, $schemata);
🧰 Tools
🪛 PHPMD (2.15.0)

64-64: Avoid unused local variables such as '$x'. (undefined)

(UnusedLocalVariable)

🤖 Prompt for AI Agents
In `@lang/php/lib/Schema/AvroRecordSchema.php` around lines 64 - 65, The
destructuring assigns an unused variable $x from nameAndNamespace(); update the
assignment in AvroRecordSchema (around the call to nameAndNamespace()) to
discard the first element (e.g., use an underscore-prefixed variable like $_ or
a two‑slot list with the first slot ignored) so only $namespace is used, then
pass $namespace into self::parseFields($fields, $namespace, $schemata); keep
parseFields and nameAndNamespace() unchanged.

Comment on lines +23 to +38
use Apache\Avro\AvroIO;
use Apache\Avro\DataFile\AvroDataIO;
use Apache\Avro\IO\AvroFile;
use Apache\Avro\Schema\AvroSchema;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

class InterOpTest extends TestCase
{
private string $projection_json;
private string $projectionJson;
private AvroSchema $projection;

public function setUp(): void
{
$interop_schema_file_name = AVRO_INTEROP_SCHEMA;
$this->projection_json = file_get_contents($interop_schema_file_name);
$this->projection = AvroSchema::parse($this->projection_json);
$this->projectionJson = file_get_contents($interop_schema_file_name);
$this->projection = AvroSchema::parse($this->projectionJson);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard file_get_contents() before assigning to a typed property.
If the schema file is missing, file_get_contents() returns false, causing a TypeError without a clear test failure message. Add an explicit check.

🛠️ Proposed fix
-        $this->projectionJson = file_get_contents($interop_schema_file_name);
-        $this->projection = AvroSchema::parse($this->projectionJson);
+        $contents = file_get_contents($interop_schema_file_name);
+        if ($contents === false) {
+            $this->fail("Could not read schema file: {$interop_schema_file_name}");
+        }
+        $this->projectionJson = $contents;
+        $this->projection = AvroSchema::parse($this->projectionJson);
🤖 Prompt for AI Agents
In `@lang/php/test/InterOpTest.php` around lines 23 - 38, In InterOpTest::setUp,
guard the file_get_contents call before assigning to the typed property
$projectionJson and before parsing into $projection: call
file_get_contents(AVRO_INTEROP_SCHEMA) into a local variable, check if the
result === false (or !is_string), and if so call $this->fail("Unable to read
AVRO_INTEROP_SCHEMA: file missing or unreadable") or throw a clear exception;
otherwise assign the string to $this->projectionJson and then call
AvroSchema::parse($this->projectionJson) to populate $this->projection.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request primarily focuses on enhancing type safety and code clarity across various Avro PHP library files. Key changes include updating PHPDoc @returns tags to @return, adding explicit type hints to method parameters and return types, and renaming numerous variables and properties from snake_case to camelCase for consistency. Specifically, compression/decompression logic in AvroDataIOReader and AvroDataIOWriter was refactored into dedicated private helper methods, complete with robust error handling and extension checks. Additionally, redundant @param PHPDoc tags were removed where type hints were already present in function signatures, and declare(strict_types=1) was added to AvroIOSchemaMatchException.php. Review comments highlighted the removal of some @param tags that still corresponded to parameters in the function signature, and raised concerns about removing explicit is_string() checks in AvroStringIO's constructor and write method, suggesting potential clarity issues or compatibility concerns in non-strict environments, despite the introduction of strict types.


/**
* @param string $str
* @param string $joiner string used to join
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param string $str tag was removed from the PHPDoc for hexString, but the $str parameter is still present in the function signature. This creates a mismatch between the documentation and the actual function definition, which can lead to confusion.

     * @param string $str

* @param string $str
* @returns string[] array of hex representation of each byte of $str
* @return string[] array of hex representation of each byte of $str
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param string $str tag was removed from the PHPDoc for hexArray, but the $str parameter is still present in the function signature. Please ensure the PHPDoc accurately reflects the function's parameters.

     * @param string $str

* @param string $format format to represent bytes
* @returns string[] array of each byte of $str formatted using $format
* @return string[] array of each byte of $str formatted using $format
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param string $str tag was removed from the PHPDoc for bytesArray, but the $str parameter is still present in the function signature. It's important to keep the PHPDoc synchronized with the code for clarity.

     * @param string $str

* @returns string of bytes of $str represented in decimal format
* @return string of bytes of $str represented in decimal format
* @uses decArray()
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param string $str tag was removed from the PHPDoc for decString, but the $str parameter is still present in the function signature. Please restore the @param tag to maintain accurate documentation.

     * @param string $str

public static function decArray($str)
public static function decArray(string $str): array
{
return self::bytesArray($str, '%3d');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param string $str tag was removed from the PHPDoc for decArray, but the $str parameter is still present in the function signature. This discrepancy should be corrected.

     * @param string $str

/**
* @param \GMP $g resource
* @return \GMP resource 64-bit two's complement of input.
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param \GMP $g resource tag was removed from the PHPDoc for gmpTwosComplement, but the $g parameter is still present in the function signature. Please restore the @param tag with the correct type hint int|string|\GMP.

     * @param int|string|\GMP $g

* @param int $shift number of bits to shift right
* @returns resource $g shifted right $shift bits
* @return \GMP $g shifted right $shift bits
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The @param int|resource|string $g tag was removed from the PHPDoc for shiftRight, but the $g parameter is still present in the function signature. Please restore the @param tag with the correct type hint int|string|\GMP.

     * @param int|string|\GMP $g

Comment on lines +100 to 101
/** @phpstan-ignore if.alwaysFalse */
if ($this->isEof()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The /** @phpstan-ignore if.alwaysFalse */ annotation is present here. While it might be necessary for PHPStan, it indicates a potential logical path that is marked as unreachable by static analysis but might be reachable at runtime, or a misunderstanding of the isEof() behavior. It would be beneficial to either clarify why this ignore is needed or refactor the logic to avoid it if possible.

Comment on lines 46 to +51
*/
public function __construct(string $str = '')
{
$this->is_closed = false;
$this->string_buffer = '';
$this->current_index = 0;

if (is_string($str)) {
$this->string_buffer .= $str;
} else {
throw new AvroIOException(
sprintf('constructor argument must be a string: %s', gettype($str))
);
}
$this->isClosed = false;
$this->stringBuffer = $str;
$this->currentIndex = 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The original constructor included a runtime check for $str being a string and threw an AvroIOException if it wasn't. While declare(strict_types=1); and the type hint string $str enforce this at the engine level, removing the explicit if (is_string($str)) check removes a defensive layer that might be useful in environments where strict types are not always enforced or for clearer error messages specific to AvroIOException.

Comment on lines 69 to +70

throw new AvroIOException(
sprintf(
'write argument must be a string: (%s) %s',
gettype($bytes),
var_export($bytes, true)
)
);
return $this->appendStr($bytes);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the constructor, the write method previously had an is_string($bytes) check. Although the parameter is now strictly typed, removing this explicit check might reduce the clarity of error messages if a non-string somehow reaches this point (e.g., via reflection or older PHP versions without strict types).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants